Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
In this work, we extend the instruction-tuned Llama-2 model with end-to-end
general-purpose speech processing and reasoning abilities while maintaining the
wide range of LLM capabilities, without using any carefully curated paired
data. The proposed model can utilize audio prompts as a replacement for text
and sustain a conversation. Such a model also has extended cross-modal
capabilities such as being able to perform speech question answering, speech
translation, and audio summarization amongst many other closed and open-domain
tasks. This is unlike prior approaches in speech, in which LLMs are extended to
handle audio for a limited number of pre-designated tasks. Experiments show
that our end-to-end approach is on par with or outperforms a cascaded system
(speech recognizer + LLM) in terms of modeling the response to a prompt.
Furthermore, unlike a cascade, our approach can interchange text and audio modalities and exploit the prior context in a conversation to provide better results.
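The cascade-versus-end-to-end contrast drawn above can be sketched in a few lines. This is a toy illustration, not the paper's actual system: `asr`, `llm`, and `speech_llm` are hypothetical stand-ins for the component models.

```python
# Cascaded baseline: a speech recognizer transcribes the audio, and a
# text-only LLM responds to the transcript. The LLM never sees the
# audio itself, so any prior audio context is lost at the transcript.
def cascaded_response(audio, asr, llm):
    return llm(asr(audio))

# End-to-end model: a single speech-capable LLM consumes the entire
# conversation history, where each turn may be text or audio, so prior
# context in either modality can shape the response.
def end_to_end_response(turns, speech_llm):
    return speech_llm(turns)
```

In the cascaded setup each reply is a function of one transcript; in the end-to-end setup the model sees the mixed-modality history as one sequence, which is what enables the modality interchange described above.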
Gendered nationalism : the gender gap in support for the Scottish National Party
Recent major surveys of the Scottish electorate and of Scottish National Party (SNP) members have revealed a distinct gender gap in support for the party. Men are markedly more likely than women to vote for the SNP, and they comprise more than two-thirds of its membership. In this article, we use data from those surveys to test various possible explanations for the disproportionately male support for the SNP. While popular accounts have focused on the gendered appeal of recent leaders and on the party's fluctuating efforts at achieving gender equality in its parliamentary representation, we find much stronger support for a different explanation. Women are less inclined to support and to join the SNP because they are markedly less supportive of its central objective of independence for Scotland. Since men and women barely differ in their reported national identities, the origins of this gender gap in support for independence present a puzzle for further research.
Metropolitan Briefing Book, 2007
The Institute of Portland Metropolitan Studies (IMS) was created to connect the resources of higher education to the needs of the six-county, bi-state Portland-Vancouver metropolitan area (Clackamas, Clark, Columbia, Multnomah, Washington, and Yamhill Counties). In this spirit, we offer our 2007 Metropolitan Briefing Book. Our theme is regional variety. Variety has been touted as the very spice of life (William Cowper) and as the mother of enjoyment (Vivian Grey). Our region enjoys a good deal of variety--in its landscapes, in its economy, and in its people, their cultures, and their attitudes. These differences are important to local vitality and beauty. But while we generally view this variety as positive, we also worry about equity. Although we promote regional thought and action, we must understand that each community experiences the problems facing us in a slightly different way and often with significantly different resources.
Prompting Large Language Models with Speech Recognition Abilities
Large language models have proven themselves highly flexible, able to solve a
wide range of generative tasks, such as abstractive summarization and
open-ended question answering. In this paper, we extend the capabilities of LLMs by directly attaching a small audio encoder, allowing them to perform speech recognition. By prepending a sequence of audio embeddings to the text token embeddings, the LLM can be converted into an automatic speech recognition (ASR) system and used in exactly the same manner as its textual counterpart.
Experiments on Multilingual LibriSpeech (MLS) show that incorporating a
conformer encoder into the open-source LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies investigating whether the LLM can be kept completely frozen during training to preserve its original capabilities, the effect of scaling up the audio encoder, and the effect of increasing the audio encoder stride to generate fewer embeddings. These studies show that multilingual ASR is possible even when the LLM is frozen, and even with strides of almost one second in the audio encoder, opening up the possibility of LLMs operating on long-form audio.
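The prepending mechanism and the striding ablation described above can be illustrated with a toy sketch. Simple frame stacking stands in for the conformer encoder, and all names here are assumptions for illustration, not the paper's implementation.

```python
def encode_audio(frames, stride):
    """Toy audio encoder: stack `stride` consecutive frames into one
    embedding each, so a larger stride yields fewer embeddings for the
    LLM to attend over (the striding ablation discussed above)."""
    return [
        [x for frame in frames[i:i + stride] for x in frame]
        for i in range(0, len(frames) - stride + 1, stride)
    ]

def build_llm_input(audio_embeddings, text_embeddings):
    """Prepend the audio embeddings to the text token embeddings; the
    combined sequence is consumed by the LLM like ordinary text."""
    return audio_embeddings + text_embeddings
```

Doubling the stride roughly halves the number of audio embeddings, which is why large strides make long-form audio tractable for a fixed LLM context length.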
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
TODM: Train Once Deploy Many - Efficient Supernet-Based RNN-T Compression for On-Device ASR Models
Automatic Speech Recognition (ASR) models need to be optimized for specific
hardware before they can be deployed on devices. This can be done by tuning the
model's hyperparameters or exploring variations in its architecture.
Re-training and re-validating models after making these changes can be a
resource-intensive task. This paper presents TODM (Train Once Deploy Many), a
new approach to efficiently train many sizes of hardware-friendly on-device ASR
models with GPU-hours comparable to those of a single training job. TODM leverages insights from prior work on Supernets, where Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces the layer sizes and widths of the Supernet to obtain subnetworks, yielding smaller models suitable for all hardware types. We introduce a novel combination of
three techniques to improve the outcomes of the TODM Supernet: adaptive
dropouts, an in-place Alpha-divergence knowledge distillation, and the use of
ScaledAdam optimizer. We validate our approach by comparing Supernet-trained
versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T using
LibriSpeech. Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, with up to 3% relative improvement in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
Comment: Meta AI; Submitted to ICASSP 202
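The weight-sharing idea behind the Supernet can be sketched as follows, under the simplifying assumption that each layer is a square weight matrix and subnetworks are obtained by slicing. This is an illustration of the general technique, not the paper's RNN-T implementation.

```python
def subnetwork(supernet, num_layers, width):
    """Extract a smaller model from a trained Supernet by keeping the
    first `num_layers` layers and the top-left `width` x `width` slice
    of each layer's weights. Every subnetwork shares these weights, so
    one training run yields many deployable model sizes."""
    return [
        [row[:width] for row in layer[:width]]
        for layer in supernet[:num_layers]
    ]
```

Because each subnetwork is a view into the same trained parameters, deploying a new size is a slicing operation rather than a fresh training-and-validation cycle, which is where the "train once, deploy many" saving comes from.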
The First Provenance Challenge
The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
A Meta-Regression Analysis to Evaluate the Effects of Narasin on Grow-Finish Pig Performance
A meta-regression analysis was conducted to evaluate the effects of added narasin in growing-finishing pig diets and to predict its influence on average daily gain (ADG), feed efficiency (G:F), and carcass yield. A database was developed containing 21 technical reports, abstracts, and refereed papers from 2012 to 2021, representing 35 observations for growth performance data in studies ranging from 35 to 116 days in length (overall data). In addition, within these 35 observations, individual period data were evaluated (143 observations) using weekly, bi-weekly, or monthly performance intervals (period data). Regression model equations were developed, and predictor variables were assessed with a stepwise manual forward selection procedure. Important variables in predicting the response to added narasin included ADG, average daily feed intake (ADFI), and G:F of the control pigs, feeding duration (shorter or longer than 65 days), and body weight (greater than or less than 230 lb). Using median values from the database for predictor variables, the meta-analysis indicated narasin would be expected to improve ADG by 1.06 to 1.65%, G:F by 0.71 to 1.71%, and carcass yield by 0.31% when fed for longer than 65 days.
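The stepwise manual forward selection procedure mentioned above can be sketched generically: at each step, add the candidate predictor that most reduces the model's error, and stop when no remaining candidate improves the fit. This is a minimal sketch of the general technique; `fit_error` is a hypothetical scoring function, not the study's actual regression model.

```python
def forward_selection(candidates, fit_error):
    """Greedy forward selection: repeatedly add the candidate predictor
    that yields the lowest error, stopping when nothing improves."""
    selected = []
    best = fit_error(selected)
    improved = True
    while improved and candidates:
        improved = False
        scores = {p: fit_error(selected + [p]) for p in candidates}
        p_best = min(scores, key=scores.get)
        if scores[p_best] < best:
            selected.append(p_best)
            candidates = [p for p in candidates if p != p_best]
            best = scores[p_best]
            improved = True
    return selected
```

With predictors such as control-pig ADG, ADFI, and feeding duration as candidates, the procedure keeps only those that measurably improve the regression's fit, mirroring the variable screening described above.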